ARTS: autonomous research topic selection system using word embeddings and network analysis
Abstract
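The materials science research process has become increasingly autonomous due to the remarkable progress in artificial intelligence. However, autonomous research topic selection (ARTS) has not yet been fully explored, owing to the difficulty of estimating the promise of a topic and the lack of previous research. This paper introduces an ARTS system that autonomously selects potential research topics that are likely to reveal new scientific facts yet have not been the subject of much previous research, by analyzing vast numbers of articles. Potential research topics are selected by analyzing the difference between two research concept networks constructed from research information in articles: one represents the promise of research topics and is constructed from word embeddings, and the other represents known facts and past research activities and is constructed from statistical information on the appearance patterns of research concepts. The ARTS system is also equipped with functions to search and visualize information about the selected research topics, to assist in the final determination of a research topic by a scientist. We developed the ARTS system using approximately 10,000 articles published in the Computational Materials Science journal. The results of our evaluation demonstrated that research topics studied after 2016 could be generated autonomously from an analysis of the articles published before 2015. This suggests that potential research topics can be effectively selected using the ARTS system.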
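The network-difference idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (cosine), the thresholds, the function names, and the toy vectors and documents are all assumptions made for illustration. It builds one set of edges from embedding similarity (a stand-in for "promise"), another from concept co-occurrence in documents (a stand-in for "known facts and past research activities"), and reports the concept pairs present in the first but absent from the second.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_edges(vectors, min_similarity):
    """Concept pairs whose embeddings are similar (assumed proxy for promise)."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= min_similarity
    }


def cooccurrence_edges(documents, min_count):
    """Concept pairs that appear together in at least min_count documents."""
    counts = {}
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            key = frozenset((a, b))
            counts[key] = counts.get(key, 0) + 1
    return {pair for pair, n in counts.items() if n >= min_count}


def candidate_topics(vectors, documents, min_similarity=0.8, min_count=1):
    """Pairs in the embedding network that are missing from the co-occurrence network."""
    promising = embedding_edges(vectors, min_similarity)
    known = cooccurrence_edges(documents, min_count)
    return sorted(tuple(sorted(pair)) for pair in promising - known)


# Toy, made-up data: hypothetical 3-dimensional embeddings and concept lists.
vectors = {
    "graphene": [0.9, 0.1, 0.0],
    "thermal conductivity": [0.8, 0.2, 0.1],
    "phonon": [0.85, 0.15, 0.05],
    "corrosion": [0.0, 0.1, 0.9],
}
documents = [
    ["graphene", "phonon"],
    ["phonon", "thermal conductivity"],
    ["corrosion"],
]

candidates = candidate_topics(vectors, documents)
print(candidates)
```

In this toy data, "graphene" and "thermal conductivity" have similar embeddings but never share a document, so that pair is the only candidate returned. In practice the embeddings would be trained on the article corpus, and both thresholds would need tuning.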
Similar resources
A Correlated Topic Model Using Word Embeddings
Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, ...
Topic Modeling Using Distributed Word Embeddings
We propose a new algorithm for topic modeling, Vec2Topic, that identifies the main topics in a corpus using semantic information captured via high-dimensional distributed word embeddings. Our technique is unsupervised and generates a list of topics ranked with respect to importance. We find that it works better than existing topic modeling techniques such as Latent Dirichlet Allocation for iden...
Topic Modelling with Word Embeddings
This work aims at evaluating and comparing two different frameworks for the unsupervised topic modelling of the CompWHoB Corpus, namely our political-linguistic dataset. The first approach applies Latent Dirichlet Allocation (henceforth LDA), whose evaluation serves as the baseline for comparison. The second framework employs Word2Vec technique...
Enhancing Feature Selection Using Word Embeddings
Health surveillance systems based on online user-generated content often rely on the identification of textual markers that are related to a target disease. Given the high volume of available data, these systems benefit from an automatic feature selection process. This is accomplished either by applying statistical learning techniques, which do not consider the semantic relationship between the...
Topic Sentiment Joint Model with Word Embeddings
The topic sentiment joint model is an extended model that aims to detect sentiments and topics simultaneously from online reviews. Most existing topic sentiment joint modeling algorithms infer the resulting distributions from the co-occurrence of words. But when the training corpus is short and small, the resulting distributions might not be very satisfactory. In this pape...
Journal
Journal title: Machine Learning: Science and Technology
Year: 2022
ISSN: 2632-2153
DOI: https://doi.org/10.1088/2632-2153/ac61eb